
Time Series Features

Lecture 4

What is a time series “feature,” and why are features useful?

A feature is a single number that summarizes one property of a time series.
Rather than staring at 500 individual time plots, features let us compare many series at once by reducing each to a small set of descriptive statistics.
Examples: the mean, the variance, the lag-1 autocorrelation, the strength of trend, the strength of seasonality.
In fpp3, features(data, variable, feature_functions) computes a table of features for one or many series simultaneously. This is the basis for large-scale automated forecasting.

Simple statistical features

Location: mean, median, quantiles.
  • The mean is the most common summary, but the median is more robust to outliers.
  • Quantiles describe the distribution of values across time.
Spread: variance, standard deviation, IQR.
  • A series with high variance relative to its mean is harder to forecast precisely.
Shape: skewness, kurtosis.
  • Positive skew means occasional large upward spikes (e.g., storm damage claims).
  • High kurtosis means fat tails — extreme values occur more often than a normal distribution would predict.
In R: features(data, variable, list(mean = mean, var = var)). Naming the list elements gives readable column names.
  • Or use the pre-built feat_acf, feat_stl shortcuts.
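
A minimal sketch of the call, using the tourism data that ships with the tsibble package (any tsibble works):
# One row per series, one column per named summary statistic
library(fpp3)
tourism |>
  features(Trips, list(mean = mean, sd = sd, median = median))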

ACF-based features quantify the autocorrelation structure of a series.
  • acf1: the lag-1 autocorrelation. High values indicate strong short-run persistence.
  • acf10: the sum of squares of the first 10 autocorrelations. Captures overall autocorrelation magnitude.
  • diff1_acf1: the lag-1 ACF of the first-differenced series. A strongly negative value suggests over-differencing.
  • season_acf1: the ACF at the seasonal lag. A large value confirms strong seasonality.
In fpp3: features(data, variable, feat_acf).
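
For example, on the tourism data (one row per series, with columns acf1, acf10, diff1_acf1, season_acf1, and so on):
library(fpp3)
tourism |>
  features(Trips, feat_acf)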

How do we measure the strength of trend and seasonality in a single number?

STL decomposition yields two powerful features: trend strength and seasonal strength.
Trend strength compares the variance of the remainder to the variance of the deseasonalized series (trend plus remainder):
FT = max(0, 1 − Var(Rt) / Var(Tt + Rt))
Seasonal strength compares the variance of the remainder to the variance of the detrended series (seasonal plus remainder):
FS = max(0, 1 − Var(Rt) / Var(St + Rt))
Both measures lie in [0, 1]. Values near 1 mean nearly all variation is explained by that component; near 0 means it is absent. In fpp3: feat_stl.
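
A sketch of the standard strength scatterplot on the tourism data (for quarterly series, feat_stl names the seasonal column seasonal_strength_year):
library(fpp3)
tourism |>
  features(Trips, feat_stl) |>
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, colour = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))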

Other useful features

Spectral entropy — measures forecastability.
  • Based on the spectral density of the series. Values near 0 = highly forecastable (strong patterns). Values near 1 = close to white noise (hard to forecast).
  • Useful for ranking a large collection of series by difficulty.
Number of peaks and troughs in the ACF.
  • Captures whether the ACF oscillates (seasonal or cyclical) or simply decays.
Unit root test statistics (KPSS, ADF).
  • Indicate whether the series is stationary or needs differencing.
  • The KPSS statistic is used by unitroot_ndiffs() to choose the number of regular differences automatically (see the sketch below).
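
Both kinds of features are available in feasts; a quick sketch on the tourism data:
library(fpp3)
# Spectral entropy: near 0 = strongly patterned, near 1 = noise-like
tourism |> features(Trips, feat_spectral)
# Number of first differences suggested by the (KPSS-based) test
tourism |> features(Trips, unitroot_ndiffs)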

What does it mean for a time series to be stationary?

A stationary series has statistical properties that do not change over time.
Formally, a series is (weakly) stationary if:
  • The mean is constant: E[yt] = μ for all t.
  • The variance is constant: Var(yt) = σ² for all t.
  • The autocovariance between yt and yt−k depends only on the lag k, not on t.
Why it matters: most time series models (ARIMA, in particular) require stationarity. A trended or heteroskedastic series must be transformed before fitting these models.
Common fix: first differencing removes a trend. Taking logs before differencing handles growing variance.
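
A sketch of the common fix, using Gas from aus_production (tsibbledata), which has both a trend and growing variance:
library(fpp3)
# log() stabilizes the variance; difference() removes the trend
aus_production |>
  autoplot(difference(log(Gas)))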

Differencing to achieve stationarity

First difference: Δyt = yt − yt−1
  • Removes a linear trend. A series that requires one difference is called I(1) (integrated of order 1).
  • Most macroeconomic series are I(1): GDP, employment, prices.
Second difference: Δ²yt = Δyt − Δyt−1
  • Removes a quadratic trend. More than two differences are rarely needed in practice.
Seasonal difference: Δmyt = yt − yt−m
  • Removes a stable seasonal pattern by subtracting the value from one full season ago.
  • Often combined with a regular first difference: difference seasonally, then difference again.
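
The corresponding operations in fpp3 use difference() with an optional lag; a sketch on quarterly data (so m = 4):
library(fpp3)
aus_production |>
  mutate(
    d_gas  = difference(Gas),                  # Δyt: regular first difference
    ds_gas = difference(Gas, lag = 4),         # Δ4 yt: seasonal difference
    dd_gas = difference(difference(Gas, 4))    # ΔΔ4 yt: seasonal, then regular
  )
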
Unit root tests formally test whether differencing is needed.
KPSS test (Kwiatkowski–Phillips–Schmidt–Shin): tests the null that the series is stationary. A small p-value rejects stationarity → difference the series.
ADF test (Augmented Dickey–Fuller): tests the null that the series has a unit root (is non-stationary). A small p-value rejects the unit root → no differencing needed.
In fpp3, unitroot_ndiffs() combines these tests to automatically select the number of regular differences needed, and unitroot_nsdiffs() selects the number of seasonal differences.
Note: always confirm the automatic suggestion with a time plot and ACF. Automated tests can fail on short or irregular series.
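
A sketch of the automatic selection on monthly retail turnover (aus_retail from tsibbledata): seasonal differences are chosen first, then regular differences on the seasonally differenced series.
library(fpp3)
total_retail <- aus_retail |>
  summarise(Turnover = sum(Turnover)) |>
  mutate(log_to = log(Turnover))
total_retail |>
  features(log_to, unitroot_nsdiffs)                # seasonal differences needed
total_retail |>
  mutate(d_log = difference(log_to, lag = 12)) |>
  features(d_log, unitroot_ndiffs)                  # regular differences after that
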
A scatterplot matrix reveals relationships among multiple time series at once.
When you have several related variables (e.g., sales across regions, or GDP, inflation, and unemployment), a scatterplot matrix plots each pair of variables against each other in a grid.
What to look for:
  • Linear vs. non-linear relationships between variables.
  • Positive or negative co-movement (potential explanatory power for regression).
  • Outliers that appear in some pairs but not others.
  • Near-identical series (possible multicollinearity if used as predictors).
In R: GGally::ggpairs() or a custom loop with ggplot2.
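
A sketch with us_change, the quarterly US macro series in tsibbledata (GGally must be installed separately):
library(fpp3)
# All pairwise scatterplots of the five variables (columns 2 to 6)
us_change |>
  GGally::ggpairs(columns = 2:6)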

Using features to analyze many series at once

Companies often need to forecast thousands of series simultaneously.
  • A retailer may forecast sales for 50,000 SKUs across 300 stores.
  • Manually inspecting each series is impossible. Features make it tractable.
PCA on the feature table reduces many features to two dimensions for visualization.
  • Each series becomes a point in feature space. Clusters of similar series often share the same best model.
  • Outlier series (unusual feature values) can be flagged for manual review.
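
A sketch of the PCA workflow: compute the full feasts feature set, then use prcomp() and broom::augment() to attach the principal components:
library(fpp3)
library(broom)
tourism_features <- tourism |>
  features(Trips, feature_set(pkgs = "feasts"))
pcs <- tourism_features |>
  select(-State, -Region, -Purpose) |>    # drop the key columns
  prcomp(scale = TRUE) |>                 # PCA on standardized features
  augment(tourism_features)               # attach PCs to the feature table
pcs |>
  ggplot(aes(x = .fittedPC1, y = .fittedPC2, colour = Purpose)) +
  geom_point()
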
Features also support model selection at scale.
  • Series with high trend strength and low seasonal strength → use an ETS(A,A,N) or ARIMA with drift.
  • Series with both high trend and high seasonal strength → use ETS(A,A,A) or seasonal ARIMA.
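
These rules can be encoded directly on the feature table. A hypothetical sketch (the 0.7 thresholds are illustrative, not canonical):
library(fpp3)
tourism |>
  features(Trips, feat_stl) |>
  mutate(suggestion = case_when(
    trend_strength > 0.7 & seasonal_strength_year > 0.7 ~ "ETS(A,A,A) / seasonal ARIMA",
    trend_strength > 0.7                                ~ "ETS(A,A,N) / ARIMA with drift",
    TRUE                                                ~ "inspect manually"
  ))
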
The features() function is the workhorse for feature-based analysis.
Common usage patterns:
# All fpp3 features in one call
data |> features(variable, feature_set(pkgs = "feasts"))
# ACF features only
data |> features(variable, feat_acf)
# STL features: trend + seasonal strength
data |> features(variable, feat_stl)
The result is a tibble with one row per series and one column per feature — ready for plotting, clustering, or model selection logic.

High autocorrelation can be misleading when the series has a trend or seasonality.
A strongly trended series will show high positive autocorrelation at all lags — not because the past truly predicts the future, but because both yt and yt−k are near the same part of the trend.
This is called spurious autocorrelation. It inflates ACF values and can make a series look much more forecastable than it really is.
Fix: compute the ACF on a stationary version of the series. Difference (or detrend and deseasonalize) before examining the ACF if the series is non-stationary.
The ACF of first-differenced data answers the real question: does today’s change predict tomorrow’s change?
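
A sketch comparing the two ACFs (Gas from aus_production):
library(fpp3)
aus_production |> ACF(Gas) |> autoplot()               # trended: large at every lag
aus_production |> ACF(difference(Gas)) |> autoplot()   # differenced: the real structure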

Chapter 4 in summary

Features compress a time series into interpretable scalars.
  • Simple stats, ACF features, STL features, spectral entropy, unit root test statistics.
Trend strength and seasonal strength (from STL) are the most practically useful.
  • They guide model choice and flag series that may need special treatment.
Stationarity is a prerequisite for ARIMA modelling.
  • Use unit root tests and ACF inspection to decide how many differences are needed.
At scale, features enable automated, data-driven forecasting pipelines.
  • PCA and clustering on features reveal structure invisible in individual time plots.